The matrix-based R\'enyi's entropy allows us to directly quantify information measures from given data, without explicit estimation of the underlying probability distribution. This intriguing property makes it widely applied in statistical inference and machine learning tasks. However, this information-theoretic quantity is not robust against noise in the data, and is computationally prohibitive in large-scale applications. To address these issues, we propose a novel measure of information, termed low-rank matrix-based R\'enyi's entropy, based on low-rank representations of infinitely divisible kernel matrices. The proposed entropy functional inherits the ability of the original definition to directly quantify information from data, but enjoys additional advantages including robustness and efficient computation. Specifically, our low-rank variant is more sensitive to informative perturbations induced by changes in the underlying distributions, while being insensitive to uninformative ones caused by noise. Moreover, low-rank R\'enyi's entropy can be efficiently approximated by random projection and Lanczos iteration techniques, reducing the overall complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2 s)$ or even $\mathcal{O}(ns^2)$, where $n$ is the number of data samples and $s \ll n$. We conduct large-scale experiments to evaluate the effectiveness of this new information measure, demonstrating superior results compared to matrix-based R\'enyi's entropy in terms of both performance and computational efficiency.
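For concreteness, the sketch below computes the standard matrix-based R\'enyi's entropy from the eigenspectrum of a trace-normalized Gram matrix, together with a naive rank-$s$ truncation of that spectrum; the truncation and its renormalization are illustrative assumptions, not necessarily the exact low-rank functional proposed here.

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    # Gaussian-kernel Gram matrix, normalized to unit trace
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=2.0):
    # matrix-based Renyi's alpha-entropy from the full eigenspectrum of A
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def low_rank_renyi_entropy(A, alpha=2.0, s=32):
    # illustrative low-rank variant: keep only the top-s eigenvalues and
    # renormalize the truncated spectrum (an assumption, not the paper's exact form)
    lam = np.clip(np.linalg.eigvalsh(A)[-s:], 0.0, None)
    lam = lam / lam.sum()
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

X = np.random.randn(200, 10)
A = gram_matrix(X)
print(renyi_entropy(A), low_rank_renyi_entropy(A, s=32))
```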
Zero-shot cross-modal retrieval (ZS-CMR) deals with the retrieval problem among heterogeneous data from unseen categories. Typically, to ensure generalization, pre-defined class embeddings from natural language processing (NLP) models are used to construct a common space. In this paper, instead of using an extra NLP model to define the common space, we consider a completely different way to construct (or learn) a common Hamming space from an information-theoretic perspective. We term our model Information-Theoretic Hashing (ITH), which is composed of two cascaded modules: an Adaptive Information Aggregation (AIA) module and a Semantic Preserving Encoding (SPE) module. Specifically, our AIA module takes inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different data modalities and filters out redundant or irrelevant information. On the other hand, our SPE module further generates the hash codes of different modalities by preserving the similarity of intrinsic semantics with an element-wise Kullback-Leibler (KL) divergence. A total correlation term is also imposed to reduce the redundancy among different dimensions of the hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed ITH for ZS-CMR. The source code is available in the supplementary material.
The recently developed matrix-based R\'enyi's entropy enables measurement of information in data simply from the eigenspectrum of a symmetric positive semi-definite (PSD) matrix in reproducing kernel Hilbert space, without estimating the underlying data distribution. This intriguing property makes the new information measure widely adopted in multiple statistical inference and learning tasks. However, the computation of this quantity involves the trace operator on the PSD matrix $G$ raised to the power $\alpha$ (i.e., $\mathrm{tr}(G^\alpha)$), with a normal complexity of nearly $\mathcal{O}(n^3)$, which severely hampers its practical usage when the number of samples (i.e., $n$) is large. In this work, we present computationally efficient approximations to this new entropy functional that can reduce its complexity to significantly less than $\mathcal{O}(n^2)$. To this end, we first develop randomized approximations to $\mathrm{tr}(G^\alpha)$ that transform the trace estimation into matrix-vector multiplication problems. We extend this strategy to arbitrary values of $\alpha$ (integer or non-integer). We then establish the connection between matrix-based R\'enyi's entropy and PSD matrix approximation, which enables us to exploit both the clustering and the block low-rank structure of $G$ to further reduce the computational cost. We theoretically provide approximation accuracy guarantees and illustrate the properties of the different approximations. Large-scale experimental evaluations on both synthetic and real-world data corroborate our theoretical findings, showing promising speedups with negligible loss in accuracy.
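As an illustration of the first step, the snippet below is a minimal Hutchinson-style estimator of $\mathrm{tr}(G^\alpha)$ for integer $\alpha$ using only matrix-vector products; the extension to non-integer $\alpha$ (e.g., via polynomial approximation) mentioned above is not shown.

```python
import numpy as np

def randomized_trace_power(G, alpha, m=64, seed=None):
    # Hutchinson-style estimator of tr(G^alpha) for integer alpha >= 1,
    # using only matrix-vector products with G (no eigendecomposition)
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    est = 0.0
    for _ in range(m):
        v = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        w = v.copy()
        for _ in range(alpha):
            w = G @ w                          # apply G^alpha to v via matvecs
        est += v @ w
    return est / m

n = 500
B = np.random.randn(n, n)
G = (B @ B.T) / n                              # a PSD test matrix
print(randomized_trace_power(G, alpha=3, seed=0),
      np.trace(np.linalg.matrix_power(G, 3)))
```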
The similarity of feature representations plays a pivotal role in the success of domain adaptation related problems. Feature similarity includes both the invariance of marginal distributions and the closeness of conditional distributions given the desired response $y$ (e.g., class labels). Unfortunately, traditional methods always learn such features without fully taking the information in $y$ into consideration, which in turn may lead to a mismatch of the conditional distributions or the mixing of discriminative structures. In this work, we introduce the recently proposed von Neumann conditional divergence to improve transferability across multiple domains. We show that this new divergence is differentiable and well suited to easily quantifying the functional dependence between features and $y$. When given multiple source tasks, we integrate this divergence to capture the discriminative information in $y$ and design novel learning objectives, assuming those source tasks are observed either simultaneously or sequentially. In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
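For reference, a minimal sketch of the standard (unconditional) von Neumann divergence between PSD matrices is given below; the conditional variant used in this work additionally conditions on the response $y$, which this sketch does not implement.

```python
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(A, B, eps=1e-6):
    # standard von Neumann (Bregman matrix) divergence between PSD matrices:
    #   D(A || B) = tr(A log A - A log B - A + B)
    # a small ridge keeps the matrix logarithms well defined
    n = A.shape[0]
    A = A + eps * np.eye(n)
    B = B + eps * np.eye(n)
    return float(np.trace(A @ logm(A) - A @ logm(B) - A + B).real)

# toy usage: divergence between feature covariance matrices of two domains
X, Y = np.random.randn(100, 5), np.random.randn(100, 5)
print(von_neumann_divergence(np.cov(X.T), np.cov(Y.T)))
```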
Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translations within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative of such challenging scenarios is deploying a translation system for a conference on a specific topic, e.g., global warming or the coronavirus, where extremely few resources are available due to the tight schedule. To motivate wider investigation of such scenarios, we present a real-world fine-grained domain adaptation task in machine translation (FGraDA). The FGraDA dataset consists of Chinese-English translation tasks for four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smart phones. Each sub-domain is equipped with a development set and a test set for evaluation purposes. To be closer to reality, FGraDA does not employ any in-domain bilingual training data, but provides bilingual dictionaries and wiki knowledge bases, which can be obtained more easily within a short time. We benchmark the fine-grained domain adaptation task and present in-depth analyses, showing that challenging problems remain in further improving performance with heterogeneous resources.
Frost damage is one of the main factors leading to wheat yield reduction. Therefore, detecting wheat frost accurately and efficiently helps growers take corresponding measures in time to reduce economic loss. To detect wheat frost, in this paper we create a hyperspectral wheat frost data set by collecting data characterized by temperature, wheat yield, and hyperspectral information provided by a handheld hyperspectral spectrometer. However, because the data are imbalanced, that is, the number of healthy samples is much higher than the number of frost damage samples, a deep learning algorithm tends to be biased toward the healthy samples and to overfit them. Therefore, we propose a method based on deep cost-sensitive learning, which uses a one-dimensional convolutional neural network as the basic framework and incorporates cost-sensitive learning with fixed factors and adjustment factors into the loss function to train the network. Accuracy and score are used as evaluation metrics. Experimental results show that the detection accuracy and the score reached 0.943 and 0.623, respectively, demonstrating that this method not only ensures the overall accuracy but also effectively improves the detection rate of frost samples.
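A minimal sketch of such a loss is shown below, assuming a fixed per-class cost factor and a focal-style adjustment factor; the exact factors used in the paper are not specified in this summary, so both choices are illustrative.

```python
import torch
import torch.nn.functional as F

def cost_sensitive_loss(logits, targets, class_costs, gamma=2.0):
    # illustrative cost-sensitive cross-entropy: a fixed per-class cost factor
    # (e.g. a higher cost for the rare frost class) combined with a focal-style
    # adjustment factor that up-weights hard samples; both choices are assumptions
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    fixed = class_costs[targets]          # fixed factor per class
    adjust = (1.0 - pt) ** gamma          # adjustment factor per sample
    return -(fixed * adjust * log_pt).mean()

# toy usage with two classes (healthy = 0, frost = 1) and a 5x cost on frost
logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
print(cost_sensitive_loss(logits, targets, class_costs=torch.tensor([1.0, 5.0])))
```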
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNN-MT directly depends on the quality of retrieved neighbors. However, original kNN-MT builds datastores based on representations from NMT models, which would result in poor retrieval accuracy when NMT models are not good enough, leading to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
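For context, the snippet below sketches the vanilla token-level kNN-MT scoring that PRED builds on: neighbors are retrieved from a datastore of (hidden state, target token) pairs and interpolated with the NMT distribution. PRED's specific contributions (datastores built from pre-trained representations and the contrastive alignment objective) are not shown.

```python
import numpy as np

def knn_mt_distribution(p_nmt, query, keys, values, vocab_size,
                        k=8, temperature=10.0, lam=0.5):
    # vanilla token-level kNN-MT: retrieve the k nearest datastore keys for the
    # current decoder state, convert negative distances into a distribution over
    # the stored target tokens, and interpolate with the NMT model's distribution
    d = np.sum((keys - query) ** 2, axis=1)        # squared L2 distances
    idx = np.argsort(d)[:k]
    logits = -d[idx] / temperature
    w = np.exp(logits - logits.max())
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[idx], w)               # aggregate weight per token id
    return lam * p_knn + (1.0 - lam) * p_nmt
```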
Abstractive summarization is the process of generating a summary given a document as input. Although significant progress has been made, the factual inconsistency between the document and the generated summary still limits its practical applications. Previous work found that the probabilities assigned by the generation model reflect its preferences for the generated summary, including the preference for factual consistency, as well as the preference for the language or knowledge prior. To separate the preference for factual consistency, we propose an unsupervised framework named CoP that controls the preference of the generation model with the help of a prompt. More specifically, the framework performs an extra inference step in which a text prompt is introduced as an additional input. In this way, another preference is described by the generation probability of this extra inference process. The difference between the above two preferences, i.e. the difference between the probabilities, can be used as a measurement for detecting factual inconsistencies. Interestingly, we found that with a properly designed prompt, our framework can evaluate specific preferences and serve as a measurement for fine-grained categories of inconsistency, such as entity-related inconsistency, coreference-related inconsistency, etc. Moreover, our framework can also be extended to the supervised setting to learn better prompts from labeled data. Experiments show that our framework achieves new SOTA results on three factual inconsistency detection tasks.
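A minimal sketch of this scoring idea is given below; the helper `log_prob` and the way the prompt is concatenated to the input are assumptions for illustration, and the actual prompt design follows the framework described above.

```python
def cop_inconsistency_score(log_prob, document, summary, prompt):
    # `log_prob(context, target)` is an assumed helper that returns the
    # generation model's log-probability of `target` given `context`
    # (e.g. from a seq2seq summarizer); the prompt concatenation is illustrative
    plain = log_prob(document, summary)                      # ordinary preference
    prompted = log_prob(document + " " + prompt, summary)    # preference with prompt
    return prompted - plain   # the gap between the two preferences is the signal
```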
Recently, non-autoregressive (NAR) neural machine translation models have received increasing attention due to their efficient parallel decoding. However, the probabilistic framework of NAR models necessitates a conditional independence assumption on target sequences, falling short of characterizing human language data. This drawback results in less informative learning signals for NAR models under conventional MLE training, thereby yielding unsatisfactory accuracy compared to their autoregressive (AR) counterparts. In this paper, we propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals. During the training stage, we introduce a set of sufficiently weak AR decoders that solely rely on the information provided by the NAR decoder to make predictions, forcing the NAR decoder to become stronger or else it will be unable to support its weak AR partners. Experiments on WMT and IWSLT datasets show that our approach can consistently improve the accuracy of multiple NAR baselines without adding any additional decoding overhead.
Recently, neural network based methods have shown their power in learning more expressive features on the task of knowledge graph embedding (KGE). However, the performance of deep methods often falls behind the shallow ones on simple graphs. One possible reason is that deep models are difficult to train, while shallow models might suffice for accurately representing the structure of simple KGs. In this paper, we propose a neural network based model, named DeepE, to address the problem, which stacks multiple building blocks to predict the tail entity based on the head entity and the relation. Each building block is an addition of a linear and a non-linear function. The stacked building blocks are equivalent to a group of learning functions with different non-linear depth. Hence, DeepE allows deep functions to learn deep features, and shallow functions to learn shallow features. Through extensive experiments, we find DeepE outperforms other state-of-the-art baseline methods. A major advantage of DeepE is its robustness. DeepE achieves a Mean Rank (MR) score that is 6%, 30%, and 65% lower than the best baseline methods on FB15k-237, WN18RR, and YAGO3-10, respectively. Our design makes it possible to train much deeper networks for KGE, e.g. 40 layers on FB15k-237, without sacrificing precision on simple relations.
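A minimal sketch of such a building block is shown below (in PyTorch, as an assumption about the implementation): each block outputs the sum of a linear path and a non-linear path, so stacking blocks yields a group of functions with varying non-linear depth.

```python
import torch.nn as nn

class DeepEBlock(nn.Module):
    # illustrative DeepE-style building block: the output is the sum of a linear
    # path and a non-linear path, so stacking L blocks mixes functions whose
    # non-linear depth ranges from 0 to L
    def __init__(self, dim, hidden):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.nonlinear = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return self.linear(x) + self.nonlinear(x)
```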